09055558 INSPEC Abstract Number: C2004-09-6150N-131
Title: Porting, monitoring and tuning UPC on NUMA architectures 
Author(s): Mohamed, A.S.

Author Affiliation: Dept. of Electr. & Comput. Eng., George Washington 
Univ., DC, USA



Conference Title: Proceedings of the International Conference on Parallel 
and Distributed Processing Techniques and Applications (PDPTA'2003) 
Part vol.4 p. 1518-25 vol.4 

Editor(s): Arabnia, H.R.; Mun, Y. 

Publisher: CSREA Press, Las Vegas, NV, USA 

Publication Date: 2003 Country of Publication: USA 4 vol. 1963 pp. 
Material Identity Number: XX-2003-03405 

Conference Title: International Conference on Parallel and Distributed
Processing Techniques and Applications (PDPTA'2003) 

Conference Date: 23-26 June 2003 Conference Location: Las Vegas, NV, 
USA 

Language: English Document Type: Conference Paper (PA)
Treatment: Practical (P) 

Abstract: We report on our experience in porting the NAS NPB benchmarks
using the recently developed GCC-SGI UPC compiler on the SGI Origin 3800
NUMA machine. In fact, the SGI NUMA environment has provided new
opportunities for UPC. For example, by coupling Unix Pthreads with
standard UPC threads, one is able to code solutions to problems using
pipelining, divide-and-conquer, and speculative parallelization styles.
This task-level parallelism was never before possible in UPC, which relies
mainly on fine-grain data parallelism over distributed shared memory. This
has led to running multiple threads per processor and has provided further
opportunities for optimization through load balancing. The SGI CC-NUMA
environment also provided memory consistency optimizations to mask the
latency of remote accesses, convert aggregate accesses into more efficient
bulk operations, and cache data locally. UPC allows programmers to specify
memory accesses with "relaxed" consistency semantics. These explicit
consistency "hints" are exploited very effectively by the CC-NUMA
environment to hide latency and further reduce coherence overheads, for
example by allowing two or more processors to modify their local copies of
shared data concurrently and merging the modifications at synchronization
points. This characteristic alleviates the effect of false sharing. Yet
another opportunity, made possible by the spectrum of performance analysis
and profiler tools within the SGI NUMA environment, is the development of a
new monitoring and tuning strategy that aims at improving the efficiency of
parallel UPC applications. We are able to project the physically monitored
parameters back to the data structures and high-level program constructs
within the UPC source code. This increases a programmer's ability to
effectively understand, develop, and optimize UPC programs, enabling an
exact analysis of a program's data and code layouts. Using this visualized
information, programmers are able to detect communication, data/thread
layout, and I/O bottlenecks, and to further optimize UPC programs with
better data and thread layouts, potentially resulting in significant
performance improvements. (8 Refs)
Subfile: C
Copyright 2004, IEE

08891455 INSPEC Abstract Number: C2004-04-6150G-036

Title: Performance monitoring and evaluation of a UPC implementation on 
NUMA architecture 

Author(s): Cantonnet, F.; Yao, Y.; Annareddy, S.; Mohamed, A.S.;
El-Ghazawi, T.A.

Author Affiliation: Dept. of Electr. & Comput. Eng., George Washington 
Univ., DC, USA

Conference Title: Proceedings International Parallel and Distributed 
Processing Symposium p. 8 pp. 

Publisher: IEEE Comput. Soc, Los Alamitos, CA, USA 



Publication Date: 2003 Country of Publication: USA CD-ROM pp. 



ISBN: 0 7695 1926 1 Material Identity Number: XX-2003-00374
Conference Title: International Parallel and Distributed Processing
Symposium (IPDPS 2003)

Conference Sponsor: IEEE Comput. Soc. Tech. Committee on Parallel Process.;
IEEE Comput. Soc. Tech. Committee on Comput. Archit.; IEEE Comput. Soc.
Tech. Committee on Distrib. Process.; ACM SIGARCH

Conference Date: 22-26 April 2003 Conference Location: Nice, France 
Language: English Document Type: Conference Paper (PA) 
Treatment: Applications (A); Practical (P)

Abstract: UPC is an explicit parallel extension of ANSI C, which has been
gaining increasing attention from vendors and users. In this paper, we
consider the low-level monitoring and experimental performance evaluation of
a new implementation of the UPC compiler on the SGI Origin family of NUMA
architectures. These systems offer many opportunities for the
high-performance implementation of UPC. They also offer, due to their many
hardware monitoring counters, the opportunity for low-level performance
measurements to guide compiler implementations. Early UPC compilers face
the challenge of meeting the syntax and semantics requirements of the
language. As a result, such compilers tend to focus on correctness rather
than on performance. In this paper, we report on the performance of
selected applications and kernels under this new compiler. The measurements
were designed to help shed some light on the next steps that should be
taken by UPC compiler developers to harness the full performance and
usability potential of UPC under these architectures. (13 Refs)

08075308 INSPEC Abstract Number: C2001-12-6150C-010
Title: UPC benchmarking issues 
Author(s): El-Ghazawi, T.; Chauvin, S.

Author Affiliation: Sch. of Comput. Sci., George Mason Univ., Fairfax,
VA, USA 

Conference Title: Proceedings International Conference on Parallel 
Processing p. 365-72 

Editor(s): Ni, L.M.; Valero, M.

Publisher: IEEE Comput. Soc, Los Alamitos, CA, USA 
Publication Date: 2001 Country of Publication: USA xix+590 pp. 
ISBN: 0 7695 1258 5 Material Identity Number: XX-2001-02008 
U.S. Copyright Clearance Center Code: 0190-3918/2001/$10.00
Conference Title: Proceedings International Conference on Parallel 
Processing 

Conference Sponsor: Int. Assoc. Comput. & Commun.

Conference Date: 3-7 Sept. 2001 Conference Location: Valencia, Spain 
Language: English Document Type: Conference Paper (PA) 
Treatment: Applications (A); Practical (P) 

Abstract: UPC, or Unified Parallel C, is a parallel extension of ANSI C.
UPC is developed around the distributed shared-memory programming model,
with constructs that allow programmers to exploit memory locality by
placing data close to the threads that manipulate them in order to minimize
remote accesses. Under the UPC memory sharing model, each thread owns a
private memory and has a logical association (affinity) with a partition of
the shared memory. This paper discusses an early release of UPC Bench, a
benchmark designed to reveal performance weaknesses in UPC compilers and
thereby uncover opportunities for compiler optimizations. The experimental
results from UPC Bench on the Compaq AlphaServer SC show that UPC Bench
is capable of discovering such compiler performance problems. Further, they
show that if such performance pitfalls are avoided through compiler
optimizations, distributed shared-memory programming paradigms can deliver
high performance while retaining their ease of programming. (11 Refs)
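As a rough illustration of the affinity model described above (this is not code from UPC Bench): in UPC, a shared array declared as `shared [B] int a[N]` is laid out in blocks of B consecutive elements that are dealt round-robin to the THREADS threads, with B = 1 (plain cyclic layout) when no layout qualifier is given. The plain C sketch below computes which thread element i has affinity with and where it sits in that thread's local partition. It assumes a positive block size and ignores UPC's indefinite-block `[]` layout; the helper names `owner_thread` and `local_offset` are hypothetical, and a real UPC program would instead query a pointer-to-shared with the language's `upc_threadof` function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: thread with affinity to element i of a UPC-style
 * block-cyclic array that has `block` elements per block, dealt round-robin
 * over `threads` threads. Assumes block > 0 and threads > 0. */
size_t owner_thread(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Hypothetical helper: position of element i inside its owner's local
 * partition. Every earlier round of `threads` blocks gave the owner one
 * block of `block` slots; i % block is the offset within the current block. */
size_t local_offset(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    size_t block_index = i / block;       /* global block number of element i */
    size_t round = block_index / threads; /* blocks the owner received earlier */
    return round * block + i % block;
}
```

For example, for `shared [2] int a[12]` on 3 threads, `owner_thread(6, 2, 3)` and `owner_thread(7, 2, 3)` are both 0, and `local_offset` places elements 6 and 7 in thread 0's local slots 2 and 3, directly after elements 0 and 1. Arranging the loop so that each thread works on the blocks it owns is what lets a compiler turn shared accesses into cheap local ones.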

09776905 E.I. No: EIP04138082533

Title: Low-level monitoring and high-level tuning of UPC on CC-NUMA 
architectures 

Author: Mohamed, Ahmed S.

Corporate Source: Department of Electrical Engineering, George Washington
University, Washington, DC 20052, United States 

Conference Title: Proceedings of the IASTED International Conference on 
Modelling, Simulation and Optimization

Conference Location: Banff, Alta., Canada Conference Date:
20030702-20030704 

Sponsor: IASTED, Technical Committee on Modelling and Simulation

E.I. Conference No.: 62482 

Source: Proceedings of the IASTED International Conference on Modelling,
Simulation and Optimization 2003.
Publication Year: 2003 
ISBN: 0889863725 
Language: English 

Document Type: CA; (Conference Article) Treatment: T; (Theoretical); X; 
(Experimental) 

Journal Announcement: 040 3W5 

Abstract: We experiment with various techniques for monitoring and tuning
UPC programs while porting the NAS NPB benchmarks using the recently
developed GCC-SGI UPC compiler on the SGI Origin 3800 NUMA machine. The
performance of the NAS NPB on the SGI NUMA environment is compared to
previous NAS NPB statistics on a Compaq multiprocessor. In fact, the SGI
NUMA environment has provided new opportunities for UPC. For example, the
spectrum of performance analysis and profiler tools within the SGI NUMA
environment made possible the development of new monitoring and tuning
strategies that aim at improving the efficiency of parallel UPC
applications. Our objective is to be able to project the physically
monitored parameters back to the data structures and high-level program
constructs within the source code. This increases a programmer's ability to
effectively understand, develop, and optimize programs, enabling an exact
analysis of a program's data and code layouts. Using this visualized
information, programmers are able to further optimize UPC programs with
better data and thread layouts, potentially resulting in significant
performance improvements. Furthermore, the SGI CC-NUMA environment provided
memory consistency optimizations to mask the latency of remote accesses,
convert aggregate accesses into more efficient bulk operations, and cache
data locally. UPC allows programmers to specify memory accesses with
"relaxed" consistency semantics. These explicit consistency "hints" are
exploited very effectively by the CC-NUMA environment to hide latency and
further reduce coherence overheads, for example by allowing two or more
processors to modify their local copies of shared data concurrently and
merging the modifications at synchronization operations. This
characteristic alleviates the effect of false sharing. 4 Refs.

Title: A Performance Analysis of the Berkeley UPC Compiler

Author: Chen, Wei-Yu; Bonachea, Dan; Duell, Jason; Husbands, Parry;
Corporate Source: Computer Science Division, University of California,
Berkeley, CA, United States

Conference Title: 2003 International Conference on Supercomputing 

Conference Location: San Francisco, CA, United States Conference Date: 
20030623-20030626 

Sponsor: ACM/SIGARCH; Intel Corporation; Florida State University

E.I. Conference No.: 62275 

Source: Proceedings of the International Conference on Supercomputing
2003. p 63-73 

Publication Year: 2003 
Language: English 

Document Type: CA; (Conference Article) Treatment: T; (Theoretical)
Journal Announcement: 040 3W2 

Abstract: Unified Parallel C (UPC) is a parallel language that uses a
Single Program Multiple Data (SPMD) model of parallelism within a global
address space. The global address space is used to simplify programming,
especially in applications with irregular data structures that lead to
fine-grained sharing between threads. Recent results have shown that the
performance of UPC using a commercial compiler is comparable to that
of MPI [7]. In this paper we describe a portable open source compiler for
UPC. Our goal is to achieve similar performance while enabling easy
porting of the compiler and runtime, and also to provide a framework that
allows for extensive optimizations. We identify some of the challenges in
compiling UPC and use a combination of micro-benchmarks and application
kernels to show that our compiler has low overhead for basic operations on
shared data and is competitive with, and sometimes faster than, the
commercial HP compiler. We also investigate several communication
optimizations, and show significant benefits by hand-optimizing the
generated code. 22 Refs.

00842542 20030128028B3862

Etnus Announces TotalView 6.0, With support for New Compilers, Platforms,
and Expanded C++ Support; Feature List Includes Much-Anticipated Linux
Compiler, IBM Regatta, and Sun 64-bit support
Business Wire

Tuesday, January 28, 2003 08:00 EST 

JOURNAL CODE: BW LANGUAGE: ENGLISH RECORD TYPE: FULLTEXT 
DOCUMENT TYPE: NEWSWIRE 
WORD COUNT: 538 

...the Intel C/C++ 7.0 for Linux and Intel Fortran 7.0 for Linux compilers

Version 6 also supports the Unified Parallel C (UPC) programming model,
which has been adopted over the last year by a consortium from...

00732429 20020618169B0044
HP Announces Industry's First UPC Compiler for Commercial Use
Business Wire 

Tuesday, June 18, 2002 08:59 EDT 

JOURNAL CODE: BW LANGUAGE: ENGLISH RECORD TYPE: FULLTEXT 
DOCUMENT TYPE: NEWSWIRE 
WORD COUNT: 591 

TEXT:

PALO ALTO, Calif., Jun 18, 2002 (BUSINESS WIRE) - HP (NYSE:HPQ) today
announced the release of its newly developed UPC compiler for Tru64 UNIX,
the first commercial release of a UPC compiler in the industry and a
technological breakthrough for the high-performance technical computing
market.

The HP UPC Compiler V2.0 (formerly the Compaq UPC Compiler) is a fully
complete implementation of the Unified Parallel C language as well as
highly scalable and extremely high performing.

Developed under a joint research agreement with the U.S. National Security
Agency, it is the only implementation of UPC with independent
documentation, run-time validation and tuning parameters, and it supports
all features in the official UPC language specification.

"UPC is a new parallel variant of the C language that holds great promise
as a means of simplifying the task of coding parallel programs while
ensuring efficient execution," said Professor Katherine Yelick, University
of California at Berkeley and Lawrence-Berkeley National Lab. "The HP
compiler is the most sophisticated UPC compiler currently available. It
implements the full UPC specification and provides application-level
access to the low-latency Quadrics interconnect. It also performs caching
and pre-fetching optimizations that allow programs written in a simple
style to obtain high performance."

UPC provides a simple shared memory model for parallel programming,
allowing data to be shared or distributed among a number of communicating
processors. This model promises easier coding of parallel applications and
excellent performance across shared memory, distributed memory and hybrid
systems.

The HP UPC Compiler V2.0 is currently running at 16 large sites on three
continents, including Lawrence Livermore National Laboratory in California,
the Pittsburgh Supercomputing Center in Pennsylvania and the Victorian
Partnership for Advanced Computing in Australia, as well as at two large
intelligence agencies and several universities.

The HP UPC Compiler V2.0 is now available and priced from US$3,750 to
US$80,000 depending on the number of CPUs required to execute the run-time
code.

More details about the HP UPC Compiler are available at
http://www.tru64unix.compaq.com/upc/.

About HP

HP is a leading global provider of products, technologies, solutions and
services to consumers and businesses. The company's offerings span IT
infrastructure, personal computing and access devices, global services and
imaging and printing. HP merged with Compaq Computer Corp. on May 3, 2002.
The merged company had combined revenue of approximately $81.7 billion in
fiscal 2001 and operations in more than 160 countries. More information
about HP is available at http://www.hp.com.

UNIX is a registered trademark of The Open Group.

This news release contains forward-looking statements that involve risks,
uncertainties and assumptions. All statements other than statements of
historical fact are statements that could be deemed forward-looking
statements. Risks, uncertainties and assumptions include the possibility
that the market for the sale of certain products and services may not
develop as expected; that development of these products and services may
not proceed as planned; and other risks that are described from time to
time in HP's Securities and Exchange Commission reports, including but not
limited to HP's annual report on Form 10-K, as amended on January 30, 2002,
for the fiscal year ended October 31, 2001, HP's quarterly report on Form
10-Q for the quarter ended January 31, 2002 (as filed with the SEC on
March 12, 2002) and subsequently filed reports. If any of these risks or
uncertainties materializes or any of these assumptions proves incorrect,
HP's results could differ materially from HP's expectations in these
statements. HP assumes no obligation and does not intend to update these
forward-looking statements.

CONTACT: HP

Dick Calandrella, 508/467-2261
dick.calandrella@hp.com

URL: http://www.businesswire.com